A preoperative magnetic resonance imaging-based model to predict biochemical failure after radical prostatectomy

We investigated whether a magnetic resonance imaging (MRI)-based model could reduce postoperative biochemical failure (BF) incidence in patients with prostate cancer (PCa). We retrospectively analyzed 967 patients who underwent prostate bi-parametric MRI and radical prostatectomy (RP) between June 2018 and January 2020. After inclusion criteria were applied, 446 patients were randomly assigned to research (n = 335) and validation (n = 111) cohorts at a 3:1 ratio. The baseline model used clinical variables only; the MRI model additionally included MRI parameters. The area under the curve (AUC) of the receiver operating characteristic and decision curves were analyzed. The outcome was postoperative BF, defined as persistently high or re-elevated prostate-specific antigen (PSA) levels in patients with PCa with no clinical recurrence. In the research (age 69 [63–74] years) and validation cohorts (age 69 [64–74] years), the postoperative BF incidence was 22.39% and 27.02%, respectively. In the research cohort, the AUCs of the baseline and MRI models were 0.780 and 0.857, respectively, a significant difference (P < 0.05). Validation cohort results were consistent (0.753 vs. 0.865, P < 0.05). At a 20% risk threshold, the false positive rate of the MRI model was lower than that of the baseline model (31% [95% confidence interval (CI): 9–39%] vs. 44% [95% CI: 15–64%]), while the true positive rate decreased only slightly (83% [95% CI: 63–94%] vs. 87% [95% CI: 75–100%]). This was equivalent to 32 fewer false positives per 100 RPs, with no increase in the number of patients with missed BF. We developed and verified an MRI-based model to predict BF incidence in patients after RP using preoperative clinical and MRI-related variables. This model could be used in clinical settings.

tional and peripheral zones, respectively 13. The prostate volume was measured by bpMRI. All lesions were evaluated by senior personnel using PI-RADSv2.1 scores. The prostate MRI regional model was defined using the following four-zone method. The prostate was trisected along its axis: the lower third was defined as the apex zone and the upper third as the basal zone. The middle third was further divided into peripheral and non-peripheral zones. Under the four-zone method, a zone was defined as positive if it contained the major part of a lesion or if a lesion involved more than half of the zone. Therefore, patients with multiple lesions could also have multiple positive zones. Extracapsular extension (EPE) and seminal vesicle invasion (SVI) were also recorded.
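The positive-zone rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the `ZoneInvolvement` record, the zone names, and the two fractional measurements are hypothetical, and "major part" is assumed to mean the zone holding the largest share of the lesion.

```python
from dataclasses import dataclass


@dataclass
class ZoneInvolvement:
    """Hypothetical per-zone measurements for one lesion (fractions in [0, 1])."""
    share_of_lesion: float  # fraction of the lesion lying in this zone
    share_of_zone: float    # fraction of this zone occupied by the lesion


def positive_zones(lesion: dict[str, ZoneInvolvement]) -> set[str]:
    """Return the zones counted as positive for a single lesion."""
    if not lesion:
        return set()
    # Assumption: "major part" = the zone holding the largest share of the lesion.
    major = max(lesion, key=lambda zone: lesion[zone].share_of_lesion)
    positive = {major}
    # A zone is also positive when the lesion involves more than half of it.
    positive |= {zone for zone, inv in lesion.items() if inv.share_of_zone > 0.5}
    return positive


def patient_positive_zones(lesions: list[dict[str, ZoneInvolvement]]) -> set[str]:
    """Union over lesions, so multiple lesions can yield multiple positive zones."""
    result: set[str] = set()
    for lesion in lesions:
        result |= positive_zones(lesion)
    return result
```

For example, a lesion lying mostly in the apex but covering 60% of the central peripheral zone would mark both zones as positive under these assumptions.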
Prediction model design. The baseline model comprised commonly used clinical variables: age at biopsy, body mass index (BMI), PSA at diagnosis, PSA density, suspicious digital rectal examination (DRE) (yes/no), biopsy pathology (ISUP grade), and surgical technique (robot-assisted or laparoscopic radical prostatectomy). The MRI model included these predictors, plus PI-RADS score (1, 2, 3, 4, or 5), EPE at bpMRI (yes/no), SVI at bpMRI (yes/no), the zonal location of suspected lesions (apex region, basal region, central peripheral zone, and central non-peripheral zone), maximum diameter of the suspected lesion, and clinical stage (T1, T2, and/or T3). The outcome was BF. Postoperative PSA levels were first measured 1-2 months after RP, then at 3-month intervals in the second year; intervals exceeding 6 months were deemed lost to follow-up.
Statistical analysis. We developed and validated two multivariable logistic regression models to predict BF after RP. We recalibrated each risk model in the validation cohort by fitting a logistic regression on the logit of the predicted risk 14. A calibration slope near 1 indicated that the predictive model was well fitted. The diagnostic accuracy of both models was assessed and compared using the area under the curve (AUC) of the receiver operating characteristic (ROC). Model fit was evaluated using calibration plots 14. False positive rates (FPR) and true positive rates (TPR) were used to evaluate the prediction accuracy for postoperative BF. The TPR was the proportion of patients with BF whose predicted risk was above the threshold, while the FPR was the proportion of patients without BF whose predicted risk was above the same threshold. The clinical value of the prediction model was assessed using the proportion of avoided BFs, the net benefit (NB), and the net reduction (NR) in false positives (FPs) 15. We estimated 95% confidence intervals (CIs) and standard errors (SEs) for the performance estimates of each predictive model, and for the differences between the two models, from 2000 bootstrap samples drawn by randomly selecting patients with replacement. In the research cohort, we refitted the prediction models and recalculated the predicted risk of every model in every sample. The 95% CIs were taken from the 2.5th and 97.5th percentiles of the resampling distribution. In the validation cohort, the resampled data comprised the outcome (whether there was postoperative BF) and the uncalibrated predicted risk from each risk model. In every sample, the simple recalibration model was refitted, and the calibrated predicted risk was then recalculated. We compared variable distributions between research and validation cohorts. Categorical variables were assessed using χ² tests, and continuous variables using Wilcoxon tests. All tests were two-sided, and P < 0.05 indicated statistical significance.
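The threshold metrics and percentile bootstrap described in this section can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names are hypothetical, a patient is assumed to be flagged when the predicted risk is at or above the threshold, and the sketch resamples (outcome, predicted risk) pairs only, without the model refitting or recalibration step that the text describes for each sample.

```python
import random


def tpr_fpr(outcomes, risks, threshold):
    """TPR and FPR when patients with risk >= threshold are flagged (assumed rule)."""
    pos = [r for y, r in zip(outcomes, risks) if y == 1]
    neg = [r for y, r in zip(outcomes, risks) if y == 0]
    tpr = sum(r >= threshold for r in pos) / len(pos)
    fpr = sum(r >= threshold for r in neg) / len(neg)
    return tpr, fpr


def auc(outcomes, risks):
    """AUC as the probability that a BF case outranks a non-BF case (ties count 1/2)."""
    pos = [r for y, r in zip(outcomes, risks) if y == 1]
    neg = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(stat, outcomes, risks, n_boot=2000, seed=0):
    """Percentile 95% CI from patients resampled with replacement."""
    if len(set(outcomes)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(outcomes)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one outcome class
            values.append(stat(ys, [risks[i] for i in idx]))
    values.sort()
    # Nearest-rank approximation to the 2.5th and 97.5th percentiles.
    return values[int(0.025 * (n_boot - 1))], values[int(0.975 * (n_boot - 1))]
```

A call such as `bootstrap_ci(auc, outcomes, risks)` would return the lower and upper bounds for the AUC; passing `lambda y, r: tpr_fpr(y, r, 0.20)[1]` as `stat` would do the same for the FPR at a 20% threshold.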
Ethical approval and consent to participate. All methods were performed in accordance with relevant guidelines and regulations. This retrospective study received ethical approval from the Hospital Ethics Committee of the First Affiliated Hospital of Nanjing Medical University. Written informed consent was obtained from all subjects.

Results
Study population. In accordance with our exclusion criteria, we finally selected 446 consecutive patients. We then randomly assigned 335 patients to the research cohort and 111 patients to the validation cohort, and both cohorts were separately included in the model. Patient demographics in both cohorts are shown in
The prediction model. In the baseline model, PSA, GG3, GG4, and GG5 were independent predictors among the clinical variables, and they remained statistically significant in the MRI model (Table 3). The risk for BF was positively associated with PSA and increased with GG3, GG4, GG5, and a lesion in the central peripheral zone. In both the research and validation cohorts, the calibration plots showed that the MRI model fitted better than the baseline model (Fig. 1).
Compared with the baseline model, the AUC of the MRI model increased from 0.780 to 0.857 (P < 0.05) in the research cohort (Fig. 2A and Table 4). In the validation cohort, the AUC likewise increased from 0.753 to 0.865 (P < 0.05) (Fig. 3A and Table 5).
TPR and FPR values for the models in the research cohort are shown in Fig. 2B. TPR and FPR values for the calibrated risk models (Table 4) in the validation cohort are shown in Table 5 and Fig. 3B. The FPR of the MRI model was lower than that of the baseline model, with only a small loss of TPR. Fig. 3C, D shows the NBs and NRs in the number of FPs for the validation cohort. We then applied the MRI model to the validation cohort. Compared with the "treat all" and "treat none" strategies ("all model" and "none model"), the NB at risk thresholds ≥ 15% was always higher for both prediction models (Figs. 2C and 3C). For instance, at a 20% risk cut-off, the NB was 3 (95% CI: 0-9) in the "all model (treat all)", 14 (95% CI: 7-23) in the baseline model, and 18 (95% CI: 11-28) in the MRI model, and the NR in the number of FPs was 0 in the "all model (treat all)", 19 (95% CI: 6-37) in the baseline model, and 32 (95% CI: 0-56) in the MRI model. The NB of the MRI model was equivalent to identifying 18 BFs per 100 men with no false positives, four more than the baseline model. Relative to all patients with positive MRI results, the NR in the number of FPs based on the MRI model was equivalent to 32 fewer false-positive BFs per 100 men, with no increase in the number of undiagnosed BFs. Overall, 66% (95% CI: 53%-90%) of "treat all" interventions could be avoided, while 83% (95% CI: 63%-94%) of postoperative BFs were identified. In contrast, the baseline model avoided 53% (95% CI: 33%-76%) of "treat all" interventions at this threshold and identified 87% (95% CI: 75%-100%) of postoperative BFs.
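To make the NB and NR figures easier to interpret, the sketch below uses the standard decision-curve definitions, which are assumed here to match those of reference 15: net benefit weights false positives by the threshold odds, and the net reduction in false positives is the NB gain over "treat all" divided by the same odds. The function names are hypothetical, and the inputs are the rounded MRI-model TPR and FPR at the 20% threshold combined with the validation-cohort BF incidence, which is itself an assumption about which cohort those rates refer to.

```python
def net_benefit(tpr, fpr, prevalence, threshold):
    """Net benefit per patient: true positives minus odds-weighted false positives."""
    odds = threshold / (1.0 - threshold)
    return tpr * prevalence - fpr * (1.0 - prevalence) * odds


def net_reduction_fp(tpr, fpr, prevalence, threshold):
    """Net reduction in false positives per patient, relative to 'treat all'."""
    odds = threshold / (1.0 - threshold)
    nb_all = net_benefit(1.0, 1.0, prevalence, threshold)  # everyone is flagged
    return (net_benefit(tpr, fpr, prevalence, threshold) - nb_all) / odds


# Illustrative inputs only (rounded values quoted in the text).
prevalence = 0.2702   # BF incidence in the validation cohort
threshold = 0.20      # 20% risk threshold
nb_mri = net_benefit(0.83, 0.31, prevalence, threshold)
nr_mri = net_reduction_fp(0.83, 0.31, prevalence, threshold)
print(f"NB per 100: {100 * nb_mri:.1f}, NR in FPs per 100: {100 * nr_mri:.1f}")
```

With these rounded inputs the sketch gives roughly 17 net true positives and 32 fewer false positives per 100 patients. Small differences from the reported values are to be expected, since the published estimates were derived from patient-level data rather than from rounded rates.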

Discussion
With the emergence of different treatments for localized PCa, the preoperative risk stratification of PCa patients is extremely important. BF is an ideal early predictor of PCa prognosis after RP. Previous studies reported that BF occurred when tumor tissue remained at surgery (i.e., positive margin and/or subclinical lymphatic metastasis) or when cancer had disseminated beyond the prostate and outside the surgical field at surgery (i.e., minimal residual disease) 16,17. Several commonly used multivariate risk tools based on pre-diagnosis PSA, T stage by DRE, and biopsy grade group categories have been used to predict postoperative PSA results 18,19. Several studies reported that including MRI-derived parameters in a risk model increased the accuracy of biochemical recurrence (BCR) prediction. For example, a multivariable model including MRI PI-RADS, along with clinical and pathological variables, outperformed the European Association of Urology classification and CAPRA scores for predicting BCR (C-index: 77% vs. 62% vs. 60%, respectively) 20. Moreover, in another study 8, a pre-surgical model incorporating PI-RADS, fusion-targeted biopsy grade, and extraprostatic extension on MRI showed better accuracy in predicting BCR (AUC = 0.68-0.71) when compared with the D'Amico classification (AUC = 0.66-0.71). However, these studies used BCR as the endpoint, and persistent PSA levels (> 0.2 ng/ml) after RP also require preoperative intervention. Soga et al. defined three subgroups each according to the D'Amico risk classification (low, intermediate, and high) and the GP score (Gleason score multiplied by PSA). No significant difference was observed in the non-BF rate between the low-risk and low GP score subgroups or between the intermediate-risk and intermediate GP score subgroups. However, the non-BF rate of the high GP score subgroup was significantly lower than that of the high-risk subgroup (42.1% vs. 66.1%, P = 0.008). Based on multivariate analyses, a high GP score (P = 0.001; hazard ratio (HR): 3.78; 95% CI: 1.95-7.35) was a significant independent risk factor for BF after prostatectomy. However, these prediction models were limited to clinical parameters 21. In previous studies, Teloken et al. reported that transition zone location indicated better BCR-free survival after adjusting for poor clinicopathological features 22. Shin et al. showed that the zonal location of lesions on MRI, in addition to the PI-RADS category, may help estimate postoperative BF risk 9. These studies confirmed the role of MRI in predicting BF, but they did not develop prediction models. When MRI parameters were included in our prediction model, we observed better model fit and higher diagnostic accuracy, avoided more BFs, and maintained a level of sensitivity to BF similar to that of the baseline model.
We used decision curve analysis (DCA) for both risk prediction models and compared their NBs with those of "treat none" and "treat all". "Treat none" refers to RP for localized PCa, while "treat all" refers to neoadjuvant androgen deprivation, extended radical operation, and lymph node dissection. In clinical settings, the risk threshold for "treat all" may be determined after physicians and patients weigh the relative hazards of an aggressive treatment regimen against the benefits of identifying postoperative BFs. Therefore, there was no single risk threshold for deciding who required RP, but rather a range of risk thresholds. Because of the higher adverse-effect profile and disputed curative effects of "treat all", we selected high risk thresholds for our DCA. Our novel MRI model also demonstrated better calibration characteristics and higher NBs when compared with the baseline model. Our DCA data indicated that when index lesion locations on bpMRI were included in the prediction model, it showed better model fitting and a

Study limitations
Our model data were similar to previous data. However, our study had several limitations: it was a retrospective, single-center study and was only internally validated. In addition, this study was based on bpMRI, which may introduce some bias compared with multi-parametric MRI. These factors may have caused some verification bias, and the findings may not be generalizable 23.

Conclusions
Using preoperative clinical and MRI-related variables, we developed and verified an MRI-based prediction model that predicted BF incidence in patients after RP. This model could be helpful in clinical settings.

Data availability
All data generated or analyzed in this study are included in the published article and its supplementary information files.